
Add missing INLINE's on EnumContainers #5499

Merged
merged 2 commits into from
Dec 12, 2024
Conversation

ChrisPenner
Contributor

Overview

I was partaking in my now-daily ritual of staring into the void (a.k.a. GHC Core), and guess what I found!

The difference here is substantial, and it implies that the places where we're still using EnumContainers probably deserve another look; at the very least I should re-examine the Core to explain where this much of a speed difference comes from.

It's really important to be careful about INLINE pragmas when helper functions live in a different module from the one they're used in, since GHC typically won't inline across module boundaries unless asked to.
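A minimal sketch of the point above (module and function names are illustrative, not from this PR): without the pragma, GHC may compile the helper once in its home module, and call sites in other modules pay a call instead of getting an inlined, specialized copy.

```haskell
module Util (bucket) where

-- A tiny helper used from other modules. The INLINE pragma makes GHC
-- keep the unfolding in the interface file, so a call site like
-- `map bucket xs` in another module can inline it and optimize further.
bucket :: Int -> Int
bucket n = n `div` 64
{-# INLINE bucket #-}
```

Without `{-# INLINE #-}`, GHC decides for itself whether the unfolding is small enough to export; marking it removes the guesswork.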

Implementation notes

Just add a bunch of INLINE annotations to EnumMap methods
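The EnumContainers pattern can be sketched roughly like this (names here are illustrative, not the exact Unison API): a newtype over `IntMap` keyed via `fromEnum`, where each wrapper method needs its own INLINE pragma because it is defined in a separate module from its call sites.

```haskell
module EnumContainers (EnumMap, lookupEM, insertEM) where

import qualified Data.IntMap.Strict as IM

-- A map from any Enum key to values, backed by an IntMap.
newtype EnumMap k v = EM (IM.IntMap v)

lookupEM :: Enum k => k -> EnumMap k v -> Maybe v
lookupEM k (EM m) = IM.lookup (fromEnum k) m
{-# INLINE lookupEM #-}

insertEM :: Enum k => k -> v -> EnumMap k v -> EnumMap k v
insertEM k v (EM m) = EM (IM.insert (fromEnum k) v m)
{-# INLINE insertEM #-}
```

Each wrapper is a thin shim over `Data.IntMap.Strict`, so inlining it lets GHC see through both the newtype and the `fromEnum` conversion at the call site.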

Benchmarks:

| Benchmark | trunk | new |
| --- | --- | --- |
| fib1 | 313.5µs | 206.985µs |
| fib2 | 2.241831ms | 1.820296ms |
| fib3 | 2.651751ms | 2.060708ms |
| Decode Nat | 341ns | 245ns |
| Generate 100 random numbers | 207.506µs | 146.037µs |
| List.foldLeft | 2.020392ms | 1.538192ms |
| Count to 1 million | 124.85875ms | 80.50275ms |
| Json parsing (per document) | 258.366µs | 214.227µs |
| Count to N (per element) | 190ns | 119ns |
| Count to 1000 | 191.382µs | 119.902µs |
| Mutate a Ref 1000 times | 316.96µs | 204.624µs |
| CAS an IO.ref 1000 times | 426.228µs | 287.614µs |
| List.range (per element) | 326ns | 244ns |
| List.range 0 1000 | 345.768µs | 255.853µs |
| Set.fromList (range 0 1000) | 1.584278ms | 1.194793ms |
| Map.fromList (range 0 1000) | 1.160257ms | 845.532µs |
| NatMap.fromList (range 0 1000) | 4.869621ms | 3.453725ms |
| Map.lookup (1k element map) | 2.539µs | 1.709µs |
| Map.insert (1k element map) | 6.829µs | 4.899µs |
| List.at (1k element list) | 286ns | 188ns |
| Text.split / | 35.598µs | 26.241µs |

@ChrisPenner ChrisPenner marked this pull request as ready for review December 11, 2024 04:32
@ChrisPenner ChrisPenner requested a review from dolio December 11, 2024 17:46
@ChrisPenner
Contributor Author

From staring at the Core I was able to determine that the difference is actually because CCache is not getting unboxed on this branch.

Something about inlining these convinced GHC NOT to unbox the CCache, which improves speed by not passing the extra dozen arguments to the eval worker.

Dan and I confirmed we get a similar speedup by just removing the !s on CCache in eval and exec.

That said, this PR should probably go in anyway, since there's no reason NOT to inline these, and clearly it's helping GHC be at least a little smarter :)
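A rough illustration of the unboxing behavior described above, with hypothetical stand-in types (not the real CCache): a bang on a product-typed argument lets GHC's worker/wrapper transformation split the record into its fields, so the worker receives one argument per field. For a record with many fields that means passing a dozen extra arguments on every call; without the bang (or when inlining changes GHC's analysis), the record stays as a single pointer.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Stand-in for a many-field environment record like CCache.
data Env = Env !Int !Int !Int

-- With the bang, strictness analysis may produce a worker taking the
-- three Int fields separately; for a large record, that multiplies the
-- argument count at every call site of the worker.
evalStep :: Env -> Int -> Int
evalStep !env n = go env n
  where
    go (Env a b c) m = a + b + c + m
```

Whether the unboxing is a win depends on how many fields the record has and how often they are all needed, which is why removing the `!`s helped here.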

The interpreter is much faster when it doesn't get unboxed.
@ChrisPenner ChrisPenner merged commit f9cc70e into trunk Dec 12, 2024
32 checks passed
@ChrisPenner ChrisPenner deleted the cp/inline-ec branch December 12, 2024 19:12